A probabilistic generative model for GO enrichment analysis
Abstract
The Gene Ontology (GO) is extensively used to analyze all types of high-throughput experiments. However, researchers still face several challenges when using GO and other functional annotation databases. One problem is the large number of hypotheses tested in each study (the multiple-testing problem). In addition, categories often overlap, both with their direct parents and descendants and with other, more distant categories in the hierarchical structure. This makes it hard to determine whether the significant categories identified represent different functional outcomes or merely a redundant view of the same biological processes. To overcome these problems, we developed a generative probabilistic model that identifies a (small) subset of categories that together explain the selected gene set. Our model accommodates noise and errors in both the selected gene set and GO. Using controlled GO data, our method correctly recovered most of the selected categories, leading to dramatic improvements over current methods for GO analysis. When applied to microarray expression data and ChIP-chip data from yeast and human, our method correctly identified both general and specific enriched categories that were overlooked by other methods.
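The core idea, choosing a small set of categories that jointly explain a noisy gene list instead of testing each category in isolation, can be illustrated with a toy model. The Python sketch below is a minimal illustration under stated assumptions and is not the authors' model. The false-positive rate `alpha`, the false-negative rate `beta`, the per-category prior `p`, the hypothetical helpers `log_score` and `greedy_select`, and the greedy search are all choices made for illustration only.

```python
import math
from typing import Dict, Set

# Hypothetical toy model (NOT the paper's exact formulation):
# - a gene is "explained" if it belongs to any active category;
# - an explained gene is selected with prob 1 - beta (beta = false-negative rate);
# - an unexplained gene is selected with prob alpha (alpha = false-positive rate);
# - each category is active a priori with prob p, which penalizes large subsets.


def log_score(active: Set[str], categories: Dict[str, Set[int]], universe: Set[int],
              selected: Set[int], alpha: float, beta: float, p: float) -> float:
    """Log-likelihood of the selected gene set plus log-prior of the active categories."""
    explained: Set[int] = set()
    for c in active:
        explained |= categories[c]
    ll = 0.0
    for g in universe:
        if g in explained:
            ll += math.log(1 - beta) if g in selected else math.log(beta)
        else:
            ll += math.log(alpha) if g in selected else math.log(1 - alpha)
    k = len(active)
    ll += k * math.log(p) + (len(categories) - k) * math.log(1 - p)
    return ll


def greedy_select(categories: Dict[str, Set[int]], universe: Set[int], selected: Set[int],
                  alpha: float = 0.05, beta: float = 0.2, p: float = 0.01) -> Set[str]:
    """Repeatedly add the category that most improves the score; stop when none does."""
    active: Set[str] = set()
    best = log_score(active, categories, universe, selected, alpha, beta, p)
    while True:
        best_cat = None
        for c in categories:
            if c in active:
                continue
            s = log_score(active | {c}, categories, universe, selected, alpha, beta, p)
            if s > best:  # keep only strict improvements over the current best
                best, best_cat = s, c
        if best_cat is None:
            return active
        active.add(best_cat)


if __name__ == "__main__":
    universe = set(range(100))
    categories = {
        "A": set(range(0, 20)),
        "A_child": set(range(0, 10)),  # nested inside A, so redundant
        "B": set(range(50, 60)),
        "C": set(range(80, 100)),
    }
    # Genes from A and B, with one missed gene (5) and one spurious gene (90).
    selected = (set(range(0, 20)) - {5}) | set(range(50, 60)) | {90}
    print(sorted(greedy_select(categories, universe, selected)))  # ['A', 'B']
```

The nested category `A_child` is not chosen because it explains no genes beyond `A` and would only add its prior cost. This mirrors the redundancy problem described above: a separate enrichment test for each category would likely flag both `A` and `A_child`. The spurious gene 90 does not pull in `C`, because explaining it would also "explain" 19 unselected genes.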
Similar articles
Enriching Text Representation with Frequent Pattern Mining for Probabilistic Topic Modeling
Probabilistic topic models have been proven very useful for many text mining tasks. Although many variants of topic models have been proposed, most existing works are based on the bag-of-words representation of text in which word combination and order are generally ignored, resulting in inaccurate semantic representation of text. In this paper, we propose a general way to go beyond the bag-of-w...
GenXHC: a probabilistic generative model for cross-hybridization compensation in high-density genome-wide microarray data
MOTIVATION Microarray designs containing millions to hundreds of millions of probes that tile entire genomes are currently being released. Within the next 2 months, our group will release a microarray data set containing over 12,000,000 microarray measurements taken from 37 mouse tissues. A problem that will become increasingly significant in the upcoming era of genome-wide exon-tiling microarr...
Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis
Synthesis of graphene oxide-TiO2 nanocomposite as an adsorbent for the enrichment and determination of rutin
Objective(s): In our study, graphene oxide-TiO2 nanocomposite (GO/TiO2) was prepared and used for the enrichment of rutin from real samples for the first time. Materials and Methods: The synthesized GO/TiO2 was characterized by X-ray diffraction, scanning electron microscopy, and FT-IR spectra. The enrichment process is fast and highly efficient. The factors including contact time, pH, and...
Bayesian Paragraph Vectors
Word2vec (Mikolov et al., 2013b) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014) find fixed-length representations for pieces of text with arbitrary lengths, such as documents, paragraphs, and sentences. In this work, we propose a novel int...
Volume: 36
Publication date: 2008